ABSTRACT


An important step in the optimization of queries in relational
databases is minimizing the number of join operations used in the
evaluation of a query. It is shown that three subclasses of tableaux
(including the subclass of simple tableaux) have O(n^2) time equivalence
and minimization algorithms. Since tableaux are nonprocedural
representations of relational expressions over select, project and join,
these minimization algorithms can be used to minimize the number of join
operators in expressions whose tableaux belong to one of these subclasses.


CR categories: 4.33, 5.25


Key words and phrases: relational database, relational algebra, query
optimization, equivalence of queries, conjunctive query, tableau,
NP-complete.


1. Introduction

The relational model for databases features two high-level query
languages: the relational algebra and the relational calculus [9,10].
The relational algebra is a procedural language that uses operators
defined on relations, and hence a query is usually translated to a
relational expression before being evaluated. However, the efficiency
with which a query can be answered depends on the relational expression
that has been chosen to represent this query. Consequently, a number of
papers (e.g., [12,13,14,15,17,18]) have considered transformations that
reduce the cost of evaluating a query. However, these transformations
do not necessarily produce an equivalent query of least cost. Chandra
and Merlin [8] show how to perform global optimization on a large class
of queries, but their algorithm is exponential in the size of the query.

The most commonly used operators of the relational algebra are
select, project and join, and a polynomial time algorithm for optimizing
a subclass of expressions with these operators is given in [4,5]. This
optimization technique uses tableaux [3] as a nonprocedural
representation of queries. Tableaux are similar to the conjunctive
queries of [8], and resemble Zloof's "Query-by-Example" language [20].
Relational expressions over select, project, and join can be represented
by tableaux [3]. In [8] it is shown that every tableau has an equivalent
minimal tableau. The importance of this result follows from the fact
that an expression with a minimum number of joins corresponds to a
minimal tableau. A polynomial minimization algorithm for the class of
simple tableaux is given in [4] (although the problem is NP-complete in
the general case [3]). In [5] it is shown how to obtain in polynomial
time an optimal expression from a minimal simple tableau (if such an
expression exists).

This optimization technique is machine independent. It is capable
of minimizing the number of join operators (note that join is the most
expensive operator to implement), eliminating redundant subexpressions,
and applying select and project as early as possible. In a relational
database system, this type of optimization should be augmented with
machine dependent optimization that takes into consideration the size of
the relations involved, sorted columns, etc.

In this paper we describe new minimization algorithms for two
subclasses of tableaux. We also show how to improve the running time of
the minimization algorithm for the class of simple tableaux. All these
algorithms have an O(n^2) running time. It is shown that each one of the
three subclasses of tableaux discussed in this paper also has an O(n^2)
time equivalence algorithm. Finally, in Sections 7 and 8 we touch upon
the problems of minimizing tableaux in the general case, and obtaining
optimal expressions from minimal tableaux.


2. Basic Definitions


2.1 The Relational Model


The relational model for databases [9] assumes that the data is
stored in tables called relations. The columns of a table correspond to
attributes, and the rows to records or tuples. Each attribute has an
associated domain of values. A tuple is viewed as a mapping from the
attributes to their domains, since in this way no canonical ordering of
the attributes is needed. If r is a relation with a column corresponding
to the attribute A, and μ is a tuple in r, then μ(A) is the value of the
A-component of μ. In this paper we usually denote a relation as a set of
tuples.

A relation scheme is a set of attributes labeling the columns of a
table, and it is usually written as a string of attributes. We often
use the relation scheme itself as the name of the table. A relation is
just the "current value" of a relation scheme. The relation is said to
be defined on the set of attributes of the relation scheme.


2.2 The Relational Algebra and Relational Expressions

The relational algebra [9,10] is a set of operators defined on
relations. In this paper we consider the operators select, project, and
join.

Let r be a relation defined on a set of attributes X, A an attribute
in X and c a value from the domain of A. The selection A=c, written
σ_{A=c}(r), is

    $$ \sigma_{A=c}(r) = \{ \mu \mid \mu \text{ is a tuple in } r \text{ and } \mu(A) = c \} $$

Let Y be a subset of X. The projection of r onto Y, written π_Y(r), is

    $$ \pi_Y(r) = \{ \mu \mid \mu \text{ is a mapping on } Y \text{, and there is a } \nu \text{ in } r \text{ that agrees with } \mu \text{ on } Y \} $$

Let r_1 and r_2 be relations defined on the relation schemes R_1 and
R_2, respectively. The (natural) join of r_1 and r_2, written r_1 ⋈ r_2,
is

    $$ r_1 \bowtie r_2 = \{ \mu \mid \mu \text{ is a mapping on } R_1 \cup R_2 \text{, and there is a } \nu_1 \text{ in } r_1 \text{ that agrees with } \mu \text{ on } R_1 \text{ and a } \nu_2 \text{ in } r_2 \text{ that agrees with } \mu \text{ on } R_2 \} $$

Note that the join includes intersection (when R_1 and R_2 are the same)
and cartesian product (when R_1 and R_2 are disjoint) as special cases.
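The three definitions translate almost literally into code. The sketch below is only an illustration under assumptions of our own: a tuple is a Python dict from attribute names to values, a relation is a list of such dicts without duplicates, and the names `select`, `project` and `join` are hypothetical helpers, not part of any library.

```python
# Assumed representation (not from the paper): a tuple is a dict mapping
# attribute names to values; a relation is a list of such dicts without duplicates.

def _dedup(tuples):
    """Remove duplicate tuples, keeping the first occurrence of each."""
    seen, out = set(), []
    for t in tuples:
        key = frozenset(t.items())
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out

def select(r, attr, c):
    """sigma_{attr=c}(r): the tuples of r whose attr-component equals c."""
    return [t for t in r if t[attr] == c]

def project(r, attrs):
    """pi_attrs(r): restrict each tuple to attrs; equal restrictions collapse."""
    return _dedup({a: t[a] for a in attrs} for t in r)

def join(r1, r2):
    """Natural join: merge each pair of tuples that agree on the shared attributes."""
    out = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                out.append({**t1, **t2})
    return _dedup(out)

AB = [{"A": 1, "B": 2}, {"A": 2, "B": 2}]
BC = [{"B": 2, "C": 3}]
print(join(AB, BC))       # [{'A': 1, 'B': 2, 'C': 3}, {'A': 2, 'B': 2, 'C': 3}]
print(project(AB, "B"))   # [{'B': 2}]
```

Joining a relation with itself returns it unchanged (intersection), and joining relations with no common attribute pairs every tuple with every other (cartesian product), matching the special cases noted above.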

The relational algebra includes other operators, and to make our
set of operators "complete" (in the sense of Codd [10]) we only need to
add the union and difference operators, and allow selections involving
arithmetic comparisons between two components of a tuple.

The operators of the relational algebra are used in the formulation
of queries in terms of relational expressions. A restricted relational
expression consists of select, project, and join as operators, and
relation schemes as operands.


2.3 Expression Values and Equivalence of Expressions

An underlying assumption in many papers (e.g., [1,6,7]) is the
existence of a single universal relation at each instant of time. This
relation is defined on the set of all the attributes, and it is called a
universal instance or just an instance. If I is an instance and E is a
relational expression, then each relation scheme R_i in E is assigned the
relation π_{R_i}(I). The value of E with respect to I, written v_I(E), is
computed by applying operators to operands in the following natural way.

(1) If E is a single relation scheme R_i, then v_I(E) = π_{R_i}(I).

(2) (a) If E = σ_{A=c}(E_1), then v_I(E) = σ_{A=c}(v_I(E_1)).

    (b) If E = π_Y(E_1), then v_I(E) = π_Y(v_I(E_1)).

    (c) If E = E_1 ⋈ E_2, then v_I(E) = v_I(E_1) ⋈ v_I(E_2).
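Rules (1) and (2a)-(2c) amount to a short recursive evaluator. The sketch below assumes an encoding of our own: single-letter attribute names, tuples as frozensets of (attribute, value) pairs, and expressions as nested Python tuples; `value`, `restrict` and `instance` are hypothetical names. The sample expression is over the universe ABC.

```python
# Assumed encoding (not from the paper): single-letter attribute names; a tuple is a
# frozenset of (attribute, value) pairs, so relations and instances are Python sets.
#   ("scheme", "AB")        the relation scheme AB
#   ("select", "B", 2, E)   sigma_{B=2}(E)
#   ("project", "AC", E)    pi_{AC}(E)
#   ("join", E1, E2)        the natural join of E1 and E2

def restrict(rel, attrs):
    """pi_attrs(rel) for a set of tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def value(E, I):
    """v_I(E), computed bottom up by rules (1) and (2a)-(2c)."""
    op = E[0]
    if op == "scheme":                                    # rule (1)
        return restrict(I, E[1])
    if op == "select":                                    # rule (2a)
        _, attr, c, sub = E
        return {t for t in value(sub, I) if dict(t)[attr] == c}
    if op == "project":                                   # rule (2b)
        _, attrs, sub = E
        return restrict(value(sub, I), attrs)
    if op == "join":                                      # rule (2c)
        left, right = value(E[1], I), value(E[2], I)
        out = set()
        for t1 in left:
            for t2 in right:
                d1, d2 = dict(t1), dict(t2)
                if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                    out.add(t1 | t2)                      # the merged tuple
        return out
    raise ValueError(f"unknown operator: {op!r}")

def instance(*rows, attrs="ABC"):
    """An instance built from digit strings such as "211", read in attribute order."""
    return {frozenset(zip(attrs, map(int, row))) for row in rows}

I = instance("211", "121", "122")
E = ("project", "AC",
     ("join", ("project", "AC", ("join", ("scheme", "AB"), ("scheme", "BC"))),
              ("select", "B", 2, ("scheme", "AB"))))
print(sorted((dict(t)["A"], dict(t)["C"]) for t in value(E, I)))  # [(1, 1), (1, 2)]
```

Representing tuples as frozensets keeps each intermediate value a true set, so duplicates produced by projection disappear automatically.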

We may regard a relational expression as a mapping from instances
to relations, i.e., the expression E maps the instance I to the relation
v_I(E). Two expressions E_1 and E_2 are equivalent if they define the same
mapping, that is, if v_I(E_1) = v_I(E_2) for all instances I. In [3]
other types of equivalence are discussed, and it is shown that results
obtained for the above type of equivalence also apply to the more
general case, where the relations assigned to the relation schemes do not
necessarily come from an instance.


3. Tableaux

Tableaux are just another way of representing mappings from
instances to relations. Unlike relational expressions, tableaux are
nonprocedural representations of queries in exactly the sense that the
relational calculus [9,10] is nonprocedural. In [3] it is shown that
every restricted relational expression has a corresponding tableau that
represents the same mapping.

A tableau is a matrix consisting of a summary and a set of rows.
The columns of a tableau correspond to the attributes of the universe in
a fixed order. The symbols appearing in a tableau are chosen from

(1) Distinguished variables, usually denoted by subscripted a's.

(2) Nondistinguished variables, usually denoted by subscripted b's.

(3) Constants, which are drawn from the domains of the attributes.

(4) Blank.

Each row may contain constants, distinguished and nondistinguished
variables. The summary has constants, distinguished variables and
blanks. A variable cannot appear in more than one column, and a
distinguished variable may appear in a particular column only if it
appears in the summary. Furthermore, if a constant or a distinguished
variable appears in some column A of the summary, then it must also
appear in column A of at least one row.

Let T be a tableau with a summary w_0 and rows w_1,...,w_n, and let S
be the set of all the nonblank symbols in T. A valuation ρ for T maps
each symbol in S to a constant, such that if c is a constant in S, then
ρ(c) = c. The valuation ρ is extended to the rows and summary of T by
defining ρ(w_i) to be the result of substituting ρ(v) for all variables v
in w_i.

A tableau T defines a mapping from instances to relations on its
target relation scheme, which is the set of all the attributes
corresponding to columns that have a nonblank symbol in the summary.
Given an instance I, the value of T, written T(I), is

    $$ T(I) = \{ \rho(w_0) \mid \text{for some valuation } \rho, \text{ we have } \rho(w_i) \in I \text{ for } 1 \le i \le n \} $$

Conventionally, we also regard ∅ as a tableau. This tableau
represents the function that maps every instance to the empty relation.

Example 1: Consider the following tableau.

                 A      B      C

                a_1           a_3
       T  =    ---------------------
                a_1     2     b_3
                b_1    b_2    a_3
                a_1    b_2    b_4

The summary is shown first, with a line below it, and integers are used
as constants.

Tableau T defines a relation on the relation scheme AC. For example,
suppose that I is the instance {211, 121, 122}.

Consider the valuation ρ which assigns 2 to b_2 and 1 to every other
variable in T. Under this valuation, each row of T becomes 121, which
is a member of I. Therefore, ρ(a_1 a_3) = 11 is in T(I).

If ρ assigns 1 to a_1, b_1, b_3 and b_4, and 2 to b_2 and a_3, the first
and third rows become 121 and the second row becomes 122; both are
members of I, and so 12 is in T(I).

Since no valuation for T produces a tuple other than 11 or 12,
T(I) = {11, 12}. []
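For a small instance, T(I) can be computed straight from the definition by trying every valuation. The sketch below is exponential in the number of variables and is only meant to reproduce Example 1. The symbol encoding (ints for constants, strings for variables, None for a blank), the encoding of I as tuples in column order ABC, and the name `tableau_value` are our own assumptions.

```python
from itertools import product

# Assumed encoding (not from the paper): ints are constants, strings are
# variables ("a..." distinguished, "b..." nondistinguished), None is a blank.
SUMMARY = ("a1", None, "a3")
ROWS = [("a1", 2, "b3"), ("b1", "b2", "a3"), ("a1", "b2", "b4")]

def tableau_value(summary, rows, instance):
    """T(I) by brute force over valuations rho (exponential; for illustration).

    A variable only needs to range over the values found in its own column of I,
    because every variable occurs in some row and that row must map into I.
    """
    column = {s: j for row in rows for j, s in enumerate(row) if isinstance(s, str)}
    variables = sorted(column)
    ranges = [sorted({t[column[v]] for t in instance}) for v in variables]
    result = set()
    for choice in product(*ranges):
        rho = dict(zip(variables, choice))

        def apply(w):
            # rho fixes constants; blanks (None) are dropped from the summary.
            return tuple(rho.get(s, s) for s in w if s is not None)

        if all(apply(w) in instance for w in rows):
            result.add(apply(summary))
    return result

I = {(2, 1, 1), (1, 2, 1), (1, 2, 2)}
print(sorted(tableau_value(SUMMARY, ROWS, I)))   # [(1, 1), (1, 2)]
```

The output agrees with T(I) = {11, 12} above: the first row forces a_1 = 1, and a_3 can only take the values 1 and 2 that occur in column C of I.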
3.1 Relational Expressions and Tableaux

Every restricted relational expression E has a corresponding
tableau T that defines the same mapping as E (however, the converse is
not true) [3]. The tableau T for the expression E can be constructed
bottom up by applying the following rules [3].

(A) If there are no operators in E, then E is a single relation scheme
    R. The tableau T for E has one row and a summary such that:
    (i) If A is an attribute in R, then in the column for A, the
        tableau T has the same distinguished variable in the summary
        and row.
    (ii) If A is not in R, then its column has a blank in the summary
        and a nondistinguished variable in the row.
(B1) Suppose E is of the form σ_{A=c}(E_1), and we have constructed T_1,
    the tableau for E_1.
    (i) If the summary of T_1 has a blank in the column for A, then the
        expression E has no meaning, and the tableau T for E is
        undefined.
    (ii) If T_1 has a constant c' ≠ c in the summary column for A, then
        for any instance I, v_I(E_1) has only tuples with c' in the
        component for A, and v_I(E) is ∅. Hence, the tableau for E is ∅.
    (iii) If T_1 has c in the summary column for A, then the tableau for
        E is T_1.
    (iv) If T_1 has a distinguished variable a in the summary column
        for A, the tableau T for E is constructed by replacing a by c
        wherever it appears in T_1.
(B2) Suppose E is of the form π_X(E_1) and T_1 is the tableau for E_1.
    Construct T for E by replacing nonblank symbols by blanks in the
    summary of T_1 for those columns whose attributes are not in X.
    Distinguished variables in those columns become nondistinguished.
(B3) Suppose E is of the form E_1 ⋈ E_2, and T_1 and T_2 are the tableaux
    for E_1 and E_2, respectively.
    (i) If T_1 and T_2 have some column in which their summaries have
        distinct constants, then it is easy to show that for all
        instances I, v_I(E) = ∅, so ∅ is the tableau for E.
    (ii) If no corresponding positions in the summaries have distinct
        constants, let S_1 and S_2 be the sets of symbols of T_1 and T_2,
        respectively. We may take S_1 and S_2 to have disjoint sets of
        nondistinguished variables, but identical distinguished
        variables in corresponding columns. Construct T for E to have the
        union of all the rows of T_1 and T_2. The summary of T has in a
        given column:

        (a) The constant c if one or both of T_1 and T_2 have c in that
            column's summary. In this case replace any distinguished
            variable in that column by c.

        (b) The distinguished symbol a if (a) does not apply, but one
            or both of T_1 and T_2 have a in that column's summary.

        (c) Blank, otherwise.

These rules can also be used to define the operations select, project
and join on tableaux. The result of applying any one of these
operators to tableaux (not necessarily tableaux derived from restricted
relational expressions) is defined to be the tableau described in the
rule for that operator.

Example 2: Consider the expression π_{AC}(π_{AC}(AB ⋈ BC) ⋈ σ_{B=2}(AB)).
If the above rules are applied to this expression, then the result is
the tableau of Example 1 (up to renaming of nondistinguished variables
and the order of the rows). []
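Rules (A) and (B1)-(B3) fit in a short recursive builder. The sketch below is our own illustration and rests on several assumptions: a three-attribute universe ABC; the distinguished variable of column j is named a_j in every tableau (rule (B3)(ii) allows identical distinguished variables); nondistinguished variables come from one counter per expression, so those of T_1 and T_2 are automatically disjoint; and None stands both for a blank and for the empty tableau ∅. The names `build`, `go` and `subst` are hypothetical. Applied to π_{AC}(π_{AC}(AB ⋈ BC) ⋈ σ_{B=2}(AB)), which we read as the expression of Example 2, it produces the rows of Example 1 up to renaming of nondistinguished variables and row order.

```python
from itertools import count

# Assumed encoding (not from the paper): the universe is ABC (one column each);
# ints are constants, "a<j>" is the distinguished variable of column j, "b<k>" are
# nondistinguished variables, and None is a blank. build() returns None for the
# empty tableau.
UNIVERSE = "ABC"

def build(E):
    """Tableau (summary, rows) of expression E, built bottom up by (A), (B1)-(B3)."""
    fresh = count(1)

    def new_b():                  # one counter for the whole expression, so the
        return f"b{next(fresh)}"  # nondistinguished variables of T_1, T_2 are disjoint

    def dist(j):                  # same distinguished variable in every tableau
        return f"a{j + 1}"

    def subst(t, old, new):       # replace a symbol everywhere, summary included
        sub = lambda w: tuple(new if s == old else s for s in w)
        return sub(t[0]), [sub(w) for w in t[1]]

    def go(E):
        op = E[0]
        if op == "scheme":                                            # rule (A)
            summary = tuple(dist(j) if a in E[1] else None
                            for j, a in enumerate(UNIVERSE))
            return summary, [tuple(s or new_b() for s in summary)]
        if op == "select":                                            # rule (B1)
            _, attr, c, sub = E
            t = go(sub)
            if t is None:
                return None
            s = t[0][UNIVERSE.index(attr)]
            if s is None:                                             # (i)
                raise ValueError("selection on a blank column is undefined")
            if isinstance(s, int):                                    # (ii), (iii)
                return t if s == c else None
            return subst(t, s, c)                                     # (iv)
        if op == "project":                                           # rule (B2)
            _, attrs, sub = E
            t = go(sub)
            for j, a in enumerate(UNIVERSE):
                if t is None or a in attrs or t[0][j] is None:
                    continue
                if isinstance(t[0][j], str):      # distinguished -> nondistinguished
                    t = subst(t, t[0][j], new_b())
                t = (t[0][:j] + (None,) + t[0][j + 1:], t[1])
            return t
        if op == "join":                                              # rule (B3)
            t1, t2 = go(E[1]), go(E[2])
            if t1 is None or t2 is None:
                return None
            summary = []
            for s1, s2 in zip(t1[0], t2[0]):
                consts = {s for s in (s1, s2) if isinstance(s, int)}
                if len(consts) > 1:                                   # (i)
                    return None
                summary.append(consts.pop() if consts else (s1 or s2))  # (a)-(c)
            t = (tuple(summary), t1[1] + t2[1])
            for j, s in enumerate(summary):
                if isinstance(s, int):            # (a): a_j becomes the constant
                    t = subst(t, dist(j), s)
            return t
        raise ValueError(f"unknown operator: {op!r}")

    return go(E)

E = ("project", "AC",
     ("join", ("project", "AC", ("join", ("scheme", "AB"), ("scheme", "BC"))),
              ("select", "B", 2, ("scheme", "AB"))))
summary, rows = build(E)
print(summary)  # ('a1', None, 'a3')
print(rows)     # [('a1', 'b3', 'b1'), ('b2', 'b3', 'a3'), ('a1', 2, 'b4')]
```

Because every operator rule only substitutes symbols or concatenates row lists, each row of the result can be traced back to one relation scheme occurrence in the expression.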


3.2 Equivalence of Tableaux

Two tableaux T_1 and T_2 are equivalent, written T_1 ≡ T_2, if for all
instances I, T_1(I) = T_2(I). We say that T_1 is contained in T_2, written
T_1 ⊆ T_2, if for all I, T_1(I) ⊆ T_2(I).

Let T_1 and T_2 be tableaux with the same target relation scheme, and
let S_1 and S_2 be the sets of symbols of T_1 and T_2, respectively. A
homomorphism is a mapping φ: S_1 → S_2 such that:

(a) If c is a constant, then φ(c) = c.

(b) If s is the summary of T_1, then φ(s) is the summary of T_2.

(c) If w is any row of T_1, then φ(w) is a row of T_2.

The following theorem is proved in [3,8].

Theorem 1: T_2 ⊆ T_1 if and only if T_1 and T_2 have the same target
relation scheme, and there is a homomorphism φ: S_1 → S_2.

By condition (c), a homomorphism φ corresponds to a mapping θ from
the rows of T_1 to the rows of T_2. The mapping θ is called a containment
mapping(1), and it satisfies the following conditions.

(1) If row w of T_1 has a constant in some column A, then θ(w) has the
    same constant in column A.

(2) If row w of T_1 has a distinguished variable in column A, then θ(w)
    has a distinguished variable in column A.

(3) If rows w and v have the same nondistinguished variable in column A,
    then rows θ(w) and θ(v) have the same symbol in column A.

Corollary 2: [3] Tableaux T_1 and T_2 are equivalent if and only if
they have identical summaries up to renaming of distinguished variables,
and there are containment mappings in both directions.
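Corollary 2 suggests a direct, exponential-time equivalence test: compare the summaries and search exhaustively for containment mappings in both directions. The sketch below is our own illustration of conditions (1)-(3), not one of the polynomial algorithms discussed later; the symbol encoding and all function names are assumptions, and the test tableaux are Example 1, Example 1 with an extra redundant row, and Example 1 without its first row.

```python
from itertools import product

# Assumed encoding (not from the paper): ints are constants, "a..." strings are
# distinguished variables, "b..." strings nondistinguished variables, None a blank.
def is_dist(s):
    return isinstance(s, str) and s.startswith("a")

def is_nondist(s):
    return isinstance(s, str) and s.startswith("b")

def is_containment_mapping(theta, rows1, rows2):
    """Check conditions (1)-(3) for theta, sending row i of rows1 to rows2[theta[i]]."""
    for i, w in enumerate(rows1):
        for s, t in zip(w, rows2[theta[i]]):
            if isinstance(s, int) and s != t:           # (1) constants are preserved
                return False
            if is_dist(s) and not is_dist(t):           # (2) distinguished stays so
                return False
    for i, j in product(range(len(rows1)), repeat=2):   # (3) shared variables stay shared
        for col, s in enumerate(rows1[i]):
            if is_nondist(s) and rows1[j][col] == s \
                    and rows2[theta[i]][col] != rows2[theta[j]][col]:
                return False
    return True

def has_containment_mapping(rows1, rows2):
    """Exhaustive search over all |rows2|**|rows1| candidate mappings (exponential)."""
    return any(is_containment_mapping(list(theta), rows1, rows2)
               for theta in product(range(len(rows2)), repeat=len(rows1)))

def same_summary(s1, s2):
    """Identical summaries up to renaming of distinguished variables."""
    return len(s1) == len(s2) and all(
        (x is None and y is None) or (isinstance(x, int) and x == y)
        or (is_dist(x) and is_dist(y)) for x, y in zip(s1, s2))

def equivalent(t1, t2):
    """Corollary 2: same summary and containment mappings in both directions."""
    (s1, r1), (s2, r2) = t1, t2
    return (same_summary(s1, s2) and has_containment_mapping(r1, r2)
            and has_containment_mapping(r2, r1))

T = (("a1", None, "a3"), [("a1", 2, "b3"), ("b1", "b2", "a3"), ("a1", "b2", "b4")])
T_extra = (T[0], T[1] + [("a1", "b5", "b6")])    # one extra, redundant row
T_less = (T[0], T[1][1:])                        # drops the only row with constant 2
print(equivalent(T, T_extra), equivalent(T, T_less))   # True False
```

The extra row of `T_extra` can be sent to the first row of T, so containment mappings exist both ways; `T_less` has no row with the constant 2, so the first row of T has nowhere to go.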

A tableau T is minimal if T is not equivalent to any tableau with
fewer rows. Note that if T comes from an expression E, then the number
of joins in E is one less than the number of rows in T. Thus, if T is
minimal, it corresponds to an expression with a minimum number of joins.
For every tableau T, there is a unique (up to renaming of variables)
tableau T' such that T ≡ T' and T' is minimal [8]. Furthermore, the
minimal tableau equivalent to T can be obtained by deleting some of the
rows of T. The core(2) of T is the set of all the rows that belong to
the minimal tableau obtained by deleting redundant rows of T. A folding
is a containment mapping from the rows of a tableau T to the rows of the
core of T, such that every row in the core of T is mapped to itself. It
can be shown that every tableau T has a folding [8]. Note that a
homomorphism that corresponds to a folding maps every variable in the
core of T to itself. The following corollary is an immediate
consequence of the results stated so far.

Corollary 3: Let T_1 and T_2 be tableaux with the same summary. If
the rows of T_1 are a subset of the rows of T_2, then

(1) T_2 ⊆ T_1, and

(2) T_2 ≡ T_1 if and only if the core of T_2 is contained in T_1.

(1) In this paper we consider only equivalence (and not proper
    containment) of tableaux, and therefore the definition of a
    containment mapping is more restricted than the original definition
    given in [3].

(2) A tableau T might have two different cores. However, they are the
    same up to renaming of variables. Therefore, we usually speak about
    the core of T.

Let w and x be rows of tableaux over the same set of attributes.
Row w covers row x, written x ≤ w, if for all columns A,

(1) if x has a constant in column A, then w has the same constant in
    column A, and

(2) if x has a distinguished variable in column A, then w has a
    distinguished variable in column A.

Note that if x is mapped to w, and x ≤ w, then the first two conditions
of a containment mapping are satisfied. Let R and S be sets of rows
over the same set of attributes. The set S covers the set R, written
R ≤ S, if every row of R is covered by some row of S.
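Covering is a purely column-wise test, so checking x ≤ w takes time linear in the number of columns. A minimal sketch, assuming an encoding of our own (ints for constants, strings beginning with "a" for distinguished variables, other strings for nondistinguished ones); `covers` and `set_covers` are hypothetical names, and the sample rows are those of Example 1.

```python
# Assumed encoding (not from the paper): ints are constants, strings starting with
# "a" are distinguished variables, other strings are nondistinguished variables.
def is_dist(s):
    return isinstance(s, str) and s.startswith("a")

def covers(w, x):
    """x <= w: w repeats every constant of x and has a distinguished variable
    wherever x has one (column by column)."""
    return all((not isinstance(xs, int) or xs == ws) and (not is_dist(xs) or is_dist(ws))
               for xs, ws in zip(x, w))

def set_covers(S, R):
    """R <= S: every row of R is covered by some row of S."""
    return all(any(covers(w, x) for w in S) for x in R)

rows = [("a1", 2, "b3"), ("b1", "b2", "a3"), ("a1", "b2", "b4")]
print(covers(rows[0], rows[2]), covers(rows[2], rows[0]))   # True False
```

The first row covers the third (it has a distinguished variable in column A and the third row has no constants), but not conversely, because the constant 2 of the first row does not reappear in the third.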

A symbol (i.e., a variable or a constant) of a tableau T is
repeated in some column A if it appears in that column in more than one
row. A tableau T is simple if whenever T has a repeated nondistinguished
variable b in some column A, then b is the only repeated symbol in that
column. The class of simple tableaux has an O(n^3) equivalence algorithm
[3], and an O(n^4) minimization algorithm [4]. Other equivalence
algorithms can be obtained from the containment algorithms of [16]. That
is, testing whether T_1 ≡ T_2 can be done in O(n^2) time in the following
two cases:

(1) Both T_1 and T_2 have at most one repeated nondistinguished variable
    in each row.

(2) Every row of T_1 is covered by at most two rows of T_2, and every row
    of T_2 is covered by at most two rows of T_1.


4. Polynomial Equivalence Algorithms

In this section we consider the following classes of tableaux.

(1) The class of all tableaux T, such that T has at most one repeated
    nondistinguished variable in each row.

(2) The class of all tableaux T, such that every row of T is covered by
    at most one row besides itself.

Note that deciding whether a tableau belongs to Class 1 (or whether a
tableau is simple) requires O(n) time. However, deciding whether a
tableau belongs to Class 2 requires O(n^2) time.
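Membership in these classes can be checked directly from the definitions. The sketch below illustrates the difference in cost noted above: the simple-tableau and Class 1 tests make one counting pass per column, while the Class 2 test compares every pair of rows. The encoding, the function names, and the assumption that the tableau has at least one row are ours.

```python
from collections import Counter

# Assumed encoding (not from the paper): ints are constants, strings starting with
# "a" are distinguished variables, strings starting with "b" are nondistinguished.
def is_dist(s):
    return isinstance(s, str) and s.startswith("a")

def is_nondist(s):
    return isinstance(s, str) and s.startswith("b")

def covers(w, x):
    """x <= w, checked column by column."""
    return all((not isinstance(xs, int) or xs == ws) and (not is_dist(xs) or is_dist(ws))
               for xs, ws in zip(x, w))

def repeated(rows, col):
    """Symbols appearing in column col of more than one row."""
    return {s for s, k in Counter(r[col] for r in rows).items() if k > 1}

def is_simple(rows):
    """A repeated nondistinguished variable must be the only repeated symbol of its column."""
    for col in range(len(rows[0])):
        rep = repeated(rows, col)
        if any(is_nondist(s) for s in rep) and len(rep) > 1:
            return False
    return True

def in_class1(rows):
    """At most one repeated nondistinguished variable in each row (one pass per column)."""
    rep = set().union(*(repeated(rows, c) for c in range(len(rows[0]))))
    return all(sum(is_nondist(s) and s in rep for s in r) <= 1 for r in rows)

def in_class2(rows):
    """Each row is covered by at most one row besides itself (all pairs of rows)."""
    return all(sum(covers(y, x) for j, y in enumerate(rows) if j != i) <= 1
               for i, x in enumerate(rows))

example1 = [("a1", 2, "b3"), ("b1", "b2", "a3"), ("a1", "b2", "b4")]
print(is_simple(example1), in_class1(example1), in_class2(example1))  # True True True
```

The tableau of Example 1 happens to belong to all three classes; in general the classes are incomparable, as the test cases below illustrate.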

2 
Theorem 4: Each one of the above classes has an O(n^2) time
equivalence algorithm.

Proof: It follows immediately from the results of [16] that the
theorem is true for Class 1. Suppose that both T_1 and T_2 are tableaux
that belong to Class 2. We say that two rows w and x are equivalent if
w covers x and x covers w. Consider the algorithm of Figure 1, which
tests whether T_1 ≡ T_2.

    begin
    (1) let T'_1 be the result of removing from T_1 every row
        that is not equivalent to some row of T_2;
    (2) let T'_2 be the result of removing from T_2 every row
        that is not equivalent to some row of T_1;
    (3) if T_1 ≢ T'_1 then return false;
    (4) if T_2 ≢ T'_2 then return false;
    (5) if T'_1 ≡ T'_2 then return true else return false;
    end

                            Figure 1

Obviously, if T_1 ≡ T'_1 and T_2 ≡ T'_2, then T_1 ≡ T_2 if and only if
T'_1 ≡ T'_2. Thus, we have to show that T_1 and T_2 cannot be equivalent
if either T_1 ≢ T'_1 or T_2 ≢ T'_2.

Suppose that T_1 ≢ T'_1. Since the rows of T'_1 are a subset of the
rows of T_1, it follows (by Corollary 3) that the core of T_1 contains a
row w which is not in T'_1. By the construction of T'_1, no row of T_2 is
equivalent to w. But equivalent tableaux have cores which are the same
(up to renaming of variables). That is, every row in the core of one
tableau has an equivalent row in the core of the other tableau.
Therefore, T_1 and T_2 cannot be equivalent. Similarly, if T_2 ≢ T'_2,
then T_1 ≢ T_2.

Obviously, for i ∈ {1,2}, each row of T_i is covered by at most two
rows of T'_i, and each row of T'_i is covered by at most two rows of T_i.
Suppose that a row w of T'_1 is covered by more than two rows of T'_2.
Let x be a row of T'_2 that is equivalent to w. Row x is covered by every
row of T'_2 that covers w. Therefore, x is covered by at least two rows
of T'_2 besides itself, which contradicts the assumption that T_2
belongs to Class 2. Hence each row of T'_1 is covered by at most two rows
of T'_2. Similarly, each row of T'_2 is covered by at most two rows of
T'_1.

It follows that testing equivalence in lines (3)-(5) can be done
using the algorithm of [16]. Each application of this algorithm
requires O(n^2) time. Lines (1) and (2) can be executed in O(n^2) time
and, therefore, the algorithm of Figure 1 has a time complexity of
O(n^2). []
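The structure of Figure 1 can be sketched as follows. The three equivalence tests of lines (3)-(5) are delegated to a helper that, in this sketch, performs an exhaustive search for containment mappings; that search is exponential and merely stands in for the O(n^2) algorithm of [16], which we do not reproduce. The encoding, the assumption that both tableaux are given by their rows and share the same summary, and all function names are our own.

```python
from itertools import product

# Assumed encoding (not from the paper): ints are constants, "a..." strings are
# distinguished variables, "b..." strings nondistinguished ones. Both tableaux are
# given by their rows and are assumed to have identical summaries.
def is_dist(s):
    return isinstance(s, str) and s.startswith("a")

def covers(w, x):
    """x <= w, checked column by column."""
    return all((not isinstance(xs, int) or xs == ws) and (not is_dist(xs) or is_dist(ws))
               for xs, ws in zip(x, w))

def maps_into(r1, r2):
    """Containment mapping from rows r1 to rows r2, by exhaustive search.

    This only stands in for the O(n^2) algorithm of [16]."""
    for theta in product(range(len(r2)), repeat=len(r1)):
        images = [r2[k] for k in theta]
        if not all(covers(v, w) for w, v in zip(r1, images)):   # conditions (1), (2)
            continue
        phi, ok = {}, True
        for w, v in zip(r1, images):
            for s, t in zip(w, v):
                if isinstance(s, str) and not is_dist(s):       # nondistinguished
                    ok = ok and phi.setdefault(s, t) == t       # condition (3)
        if ok:
            return True
    return False

def equivalent(r1, r2):
    return maps_into(r1, r2) and maps_into(r2, r1)

def figure1(t1, t2):
    """Equivalence test of Figure 1 for two Class 2 tableaux."""
    same = lambda w, x: covers(w, x) and covers(x, w)        # equivalent rows
    p1 = [w for w in t1 if any(same(w, x) for x in t2)]      # line (1)
    p2 = [x for x in t2 if any(same(x, w) for w in t1)]      # line (2)
    if not equivalent(t1, p1):                               # line (3)
        return False
    if not equivalent(t2, p2):                               # line (4)
        return False
    return equivalent(p1, p2)                                # line (5)

T1 = [("a1", "b1", "b2"), ("a1", 4, "b3")]
T2 = [("a1", 4, "b5")]
print(figure1(T1, T2))   # True
```

In the usage example, line (1) drops the redundant first row of `T1`, line (3) confirms that the drop preserves equivalence, and line (5) compares the two remaining one-row tableaux.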


5. Obtaining Minimization Algorithms from Equivalence Algorithms

Let S be a class of tableaux. We say that S is closed under row
deletion if whenever a tableau T is in S, and T' is obtained by deleting
some of the rows of T, then T' is also in S.

Theorem 5: Let S be a class of tableaux closed under row deletion.
If there is an equivalence algorithm for S that runs in F(n) time
(F(n) ≥ cn for some constant c), then there is a minimization algorithm
for S that runs in nF(n) time.

Proof: Figure 2 describes a minimization algorithm for S. The
input for this algorithm is a tableau T of S. We assume that the
numbers 1,2,...,r correspond to the rows of T. The algorithm is based
on the ability to test equivalence of tableaux from S. Note that the
equivalence test in line (3) can be replaced with the containment test
T' ⊆ T, because the rows of T' are a subset of the rows of T and,
hence, T ⊆ T' (Corollary 3).

Let T_i be the value of T after i iterations through the loop of
lines (1)-(3). Since T is assigned a new value in line (3) only if the
new value is equivalent to the old value, T_r (i.e., the tableau returned
by the algorithm) is equivalent to T_0. Suppose that T_r is not minimal.
Then there is a row j in T_r that does not belong to the core of T_r.
Let T'_r be the result of deleting row j from T_r. Since the core of T_r
is not changed as a result of this deletion, T'_r ≡ T_r and, hence,
T'_r ≡ T_0.

Since row j has not been deleted by the algorithm, it must be in
T_j. Let T'_j be the result of removing row j from T_j. It follows that
T'_j is not equivalent to T_0 (otherwise row j could not appear in T_r).
But the rows of T'_r are a subset of the rows of T'_j, and the rows of
T'_j are a subset of the rows of T_0 and, therefore, T_0 ⊆ T'_j ⊆ T'_r.
Since T'_r ≡ T_0, it follows that T'_j ≡ T_0. This contradiction implies
that T_r is minimal.

In each pass through the main loop, line (2) requires O(n) time and
line (3) requires F(n) time. The loop is repeated r times, giving a
time complexity of nF(n) (since r ≤ n). []
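The loop of Figure 2 is short enough to sketch directly. In the sketch below the equivalence oracle is an exhaustive search for containment mappings (exponential, so it illustrates only the structure of the algorithm, not the nF(n) bound); following the remark in the proof, it tests only T' ⊆ T, i.e., whether the rows of T can be mapped into the rows of T'. The encoding and names are our own assumptions.

```python
from itertools import product

# Assumed encoding (not from the paper): ints are constants, "a..." strings are
# distinguished variables, "b..." strings nondistinguished variables. A tableau is
# given by its rows; the summary is unchanged by row deletion, so it is omitted.
def is_dist(s):
    return isinstance(s, str) and s.startswith("a")

def maps_into(r1, r2):
    """Brute-force search for a containment mapping from rows r1 to rows r2."""
    for theta in product(range(len(r2)), repeat=len(r1)):
        phi, ok = {}, True
        for w, k in zip(r1, theta):
            for s, t in zip(w, r2[k]):
                if isinstance(s, int):
                    ok = ok and s == t                        # condition (1)
                elif is_dist(s):
                    ok = ok and is_dist(t)                    # condition (2)
                else:
                    ok = ok and phi.setdefault(s, t) == t     # condition (3)
        if ok:
            return True
    return False

def minimize(rows):
    """Figure 2: delete row i whenever the smaller tableau is still equivalent.

    The rows of T' are a subset of those of T, so T ⊆ T' always holds and
    line (3) only needs T' ⊆ T, i.e. a mapping from the rows of T into T'."""
    rows = list(rows)
    for row in list(rows):                           # line (1): rows 1..r of the input
        smaller = [x for x in rows if x != row]      # line (2)
        if maps_into(rows, smaller):                 # line (3)
            rows = smaller
    return rows

example1 = [("a1", 2, "b3"), ("b1", "b2", "a3"), ("a1", "b2", "b4")]
print(minimize(example1 + [("a1", "b5", "b6")]) == example1)   # True
```

Replacing `maps_into` by a polynomial equivalence test for a class closed under row deletion would give the nF(n) bound of Theorem 5.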


6. Polynomial Minimization Algorithms

The classes of tableaux described in Section 4, and the class of
simple tableaux have polynomial time equivalence algorithms. Each one of
these classes is closed under row deletion and, therefore, Theorem 5 can
be applied to obtain polynomial minimization algorithms for these
classes. However, the algorithms obtained by applying Theorem 5 are not
the most efficient minimization algorithms for these classes. For each
one of these classes there is a minimization algorithm with a time
complexity of O(n^2). These minimization algorithms are given in the
following sections.

    begin
    (1) for i := 1 to r do
        begin
    (2)     let T' be the result of deleting row i from T;
    (3)     if T ≡ T' then T := T';
        end;
    (4) return T;
    end

                            Figure 2


6.1 Regular Tableaux


In this section we will show that a tableau in which a folding does
not eliminate any repeated nondistinguished variable can be minimized in
O(n^2) time. This fact is used in developing the minimization algorithms
of the following sections.


Let b be a repeated nondistinguished variable of a tableau T. The
variable b is essential if it appears in every core of T. A tableau T
is regular if all repeated nondistinguished variables of T are
essential.
    begin
    /* The input is a tableau T. */
    /* Initially all the rows of T are marked "unconsidered". */
    (1) while there is a row w marked "unconsidered" do
        begin
    (2)     mark w "considered";
    (3)     for every row x other than w do
    (4)         if in whatever column x and w disagree,
                x has a nondistinguished variable that appears
                nowhere else in T then delete x from T;
        end;
    (5) return T;
    end

                            Figure 3


Lemma 6: The algorithm of Figure 3 returns a tableau T̄ equivalent
to T in O(n^2) time. Furthermore, if T is regular, then T̄ is minimal.


Proof: Consider line (4) of the algorithm. Row x is deleted if
there is a row w in T, such that for all columns A, if x and w disagree
in column A, then x has a nondistinguished variable that appears nowhere
else in T. By Corollary 2 in [3], the tableau obtained by deleting row
x from T is equivalent to T. Thus, T̄ (the tableau returned by the
algorithm) is equivalent to T. As for the running time of this
algorithm, let T have r rows and t columns. The cost of executing line
(4) once is O(t). In each iteration of the outer loop, the inner loop is
executed at most r times. The outer loop is executed no more than r
times. Thus, the total cost of line (4) is O(r^2 t). Every other line has
a constant cost, and is executed no more than r^2 times. Since both rt
and r are at most n, the algorithm has an O(n^2) running time.

Suppose that T is regular. Every repeated nondistinguished variable
of T is essential and, therefore, must appear in the core of T.
Consider a folding from T onto its core. This folding maps every row in
the core of T to itself and, therefore, the corresponding homomorphism
maps every repeated nondistinguished variable of T to itself. Suppose
that row x of T is mapped to some other row w in the core of T. Since
each repeated nondistinguished variable is mapped to itself, it follows
that in whatever column x and w disagree, x has a nondistinguished
variable that appears nowhere else in T. Therefore, x is deleted during
the execution of the above algorithm. Since this is true for every x
which is not in the core of T, T̄ is minimal. []
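A direct transcription of Figure 3 follows, under an encoding of our own choosing (ints for constants, strings starting with "a"/"b" for distinguished/nondistinguished variables); the function name `figure3` is hypothetical. Occurrence counts are kept in a `Counter` so that the test in line (4) looks at each column once; the bookkeeping for the outer loop is kept naive, so the sketch does not try to match the O(n^2) bound of Lemma 6 exactly.

```python
from collections import Counter

# Assumed encoding (not from the paper): ints are constants, strings starting with
# "a" are distinguished variables, strings starting with "b" nondistinguished ones.
def is_nondist(s):
    return isinstance(s, str) and s.startswith("b")

def figure3(rows):
    """Delete each row x that agrees with some surviving row w except in columns where
    x holds a nondistinguished variable occurring nowhere else in the tableau."""
    rows = list(rows)                                    # rows of a tableau are distinct
    counts = Counter(s for r in rows for s in r)         # occurrences of each symbol
    considered = set()
    while True:
        pending = [w for w in rows if w not in considered]
        if not pending:                                  # line (1)
            break
        w = pending[0]
        considered.add(w)                                # line (2)
        for x in [r for r in rows if r != w]:            # line (3)
            if all(xs == ws or (is_nondist(xs) and counts[xs] == 1)
                   for xs, ws in zip(x, w)):             # line (4)
                rows.remove(x)
                counts.subtract(x)                       # keep counts in step with T
    return rows                                          # line (5)

example1 = [("a1", 2, "b3"), ("b1", "b2", "a3"), ("a1", "b2", "b4")]
print(figure3(example1 + [("a1", "b5", "b6")]) == example1)   # True
```

On the non-regular tableau `[("a1", 5), ("b1", "b2"), ("b3", "b2")]` the sketch returns two rows although the core has only the row `("a1", 5)`: the repeated variable b2 is not essential, which is why the algorithms below first delete rows to reach a regular tableau.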

Each one of the following algorithms for minimizing a tableau T has
two steps. In the first step some rows of T are deleted in order to
obtain an equivalent regular tableau T'. In the second step the
algorithm for minimizing regular tableaux is applied to T'.


6.2 Minimizing Tableaux of Class 1

Let T be a tableau that has at most one repeated nondistinguished
variable in each row. For each repeated nondistinguished variable b,
let W(b) be the set of all the rows that contain b. Suppose that some
repeated nondistinguished variable b_0 is not essential, and let A be the
column in which b_0 appears. Let θ be a folding from the rows of T to
its core. Obviously, all the rows of θ(W(b_0)) have the same symbol d
(d ≠ b_0) in column A, and W(b_0) is covered by θ(W(b_0)). The following
lemma shows that these conditions are also sufficient for the
elimination of b_0.
begin
     /* The input is a tableau T that belongs to Class 1. */
     /* Initially all the variables of T are marked "unconsidered". */
(1)  while there is a repeated nondistinguished
        variable b marked "unconsidered" do
     begin
(2)     mark b "considered";
(3)     let A be the column in which b appears;
(4)     for each symbol s (s ≠ b) in column A do
        begin
(5)        let S be the set of all the rows of T
              that have the symbol s in column A;
(6)        if W(b) is covered by S then delete W(b) from T;
        end;
     end;
(7)  return T;
end

Figure 4


Lemma 7: Suppose that $\psi$ is a mapping from T to itself such that $\psi(x) = x$ for all $x \notin W(b_0)$. Then $\psi$ is a containment mapping if and only if

(1) all the rows of $\psi(W(b_0))$ have the same symbol in column A, and

(2) for all $x \in W(b_0)$, x is covered by $\psi(x)$.

Proof: Obviously, conditions (1) and (2) are satisfied if $\psi$ is a containment mapping. Conversely, if $\psi$ satisfies condition (2), then $\psi$ satisfies the first two conditions of a containment mapping. If two distinct rows x and w of T have the same nondistinguished variable b in some column B, then b is repeated and, since each row contains at most one repeated nondistinguished variable, either both x and w are in $W(b_0)$ and b is $b_0$, or both x and w are not in $W(b_0)$. In the first case, $\psi(x)$ and $\psi(w)$ have the same symbol in column B by condition (1); in the second case, $\psi(x) = x$ and $\psi(w) = w$. Thus, $\psi$ is a containment mapping. []

By Lemma 7, if $b_0$ is not essential, then we can always delete all the rows containing $b_0$ by finding a set of rows S, such that S covers $W(b_0)$ and all the rows of S have the same symbol in column A. The algorithm of Figure 4 computes a regular tableau T' equivalent to T in this way.
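The algorithm of Figure 4 can be sketched in Python as follows. This is a minimal sketch under assumptions of ours: symbols are encoded, hypothetically, as strings (a leading "a" marks a distinguished variable, a leading "b" a nondistinguished variable, anything else a constant); "w covers x" is read as "w has x's symbol in every column where x has a distinguished variable or a constant"; and line (6) is read as "every row of W(b) is covered by some row of S". The comments map the code to the numbered lines of Figure 4; the sketch favors clarity over the bookkeeping used in the proof of Theorem 8.

```python
from collections import Counter


def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def covers(w, x):
    # Assumed definition: w has the same symbol as x in every column where x
    # has a distinguished variable or a constant.
    return all(ws == xs for xs, ws in zip(x, w) if not is_nondist(xs))


def regularize_class1(rows):
    """Delete W(b) whenever it is covered by the rows S sharing one symbol s != b
    in the column of b (input assumed to be in Class 1)."""
    rows = [tuple(r) for r in rows]
    alive = set(range(len(rows)))
    counts = Counter(s for r in rows for s in r)
    repeated = sorted(s for s in counts if is_nondist(s) and counts[s] > 1)
    for b in repeated:                                    # lines (1)-(2)
        W = [i for i in sorted(alive) if b in rows[i]]
        if not W:
            continue
        A = rows[W[0]].index(b)                           # line (3)
        for s in {rows[j][A] for j in alive} - {b}:       # line (4)
            S = [j for j in alive if rows[j][A] == s]     # line (5)
            if all(any(covers(rows[j], rows[i]) for j in S) for i in W):
                alive -= set(W)                           # line (6)
                break
    return [rows[i] for i in sorted(alive)]               # line (7)


print(regularize_class1([("a1", "b1"), ("b2", "b1"), ("a1", "a2")]))
# [('a1', 'a2')]
```

In the example, both rows containing b1 are covered by the single row that has a2 in the column of b1, so they are deleted, mirroring the mapping b1 → a2, b2 → a1.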

Theorem 8: A tableau T that has at most one repeated nondistinguished variable in each row can be minimized in $O(n^2)$ time.

Proof: Suppose that T has r rows and t columns. Consider the algorithm of Figure 4. Lines (1)-(3) have a total cost of no more than $O(n)$. In each iteration of the outer loop, the cost of lines (4) and (5) is no more than $O(r)$. The outer loop is executed at most r times, giving a total cost of no more than $O(r^2)$. The cost of line (6) is assigned to the rows of T as follows: whenever a row w of W(b) is compared with some row of S, the cost of this comparison (i.e., $O(t)$) is assigned to w. Row w is compared with at most r rows, so the total cost assigned to w is no more than $O(rt)$. Since each row of T appears in at most one of the sets W(b), the total cost of line (6) is $O(r^2 t)$. Both rt and r are at most n and, therefore, the algorithm has a running time of $O(n^2)$. Since the output of this algorithm is a regular tableau, to which the $O(n^2)$ algorithm for regular tableaux can then be applied, T can be minimized in time $O(n^2)$. []


6.3 Minimizing Tableaux of Class 2

Let T be a tableau such that every row of T is covered by at most one row besides itself. For each row w of T, let c(w) be the other row that covers w (if no other row of T covers w, then $c(w) = w$). Note that the function c can be computed in $O(n^2)$ time. Our goal is to find whether a repeated nondistinguished variable $b_0$ is essential.
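A direct way to compute c within that bound is to compare every pair of rows. The Python sketch below does so under two assumptions of ours: symbols are encoded, hypothetically, as strings (a leading "a" marks a distinguished variable, a leading "b" a nondistinguished variable, anything else a constant), and "w covers x" is read as "w has x's symbol in every column where x has a distinguished variable or a constant", which is the property the proofs of Lemmas 7 and 9 rely on.

```python
def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def covers(w, x):
    # Assumed definition: w has the same symbol as x in every column where x
    # has a distinguished variable or a constant.
    return all(ws == xs for xs, ws in zip(x, w) if not is_nondist(xs))


def cover_function(rows):
    """Return c as a list of row indices, with c[i] == i when no other row covers row i."""
    c = list(range(len(rows)))
    for i, x in enumerate(rows):  # r * r pairs at O(t) each: O(r^2 t), within O(n^2)
        others = [j for j, w in enumerate(rows) if j != i and covers(w, x)]
        if len(others) > 1:
            raise ValueError(f"row {i} is covered by rows {others}; T is not in Class 2")
        if others:
            c[i] = others[0]
    return c


T = [("a1", "b1", "x"), ("b2", "b1", "y"), ("a1", "a2", "x"), ("b3", "a2", "y")]
print(cover_function(T))  # [2, 3, 2, 3]
```

Raising an error when a row has two covering rows is our choice; it simply flags inputs outside Class 2.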

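Let $W(b_0)$ be the set of all the rows containing $b_0$. Suppose that $b_0$ is not essential, and let $\psi$ be a folding of T that eliminates $b_0$. It follows that for all rows w in $W(b_0)$, w is mapped to c(w) (w is not in the image of $\psi$, so it must be mapped to some other row that covers it, and c(w) is the only such row). We can start to compute the homomorphism h corresponding to $\psi$ as follows.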

(1) If a nondistinguished variable b appears in column A of some row w of $W(b_0)$, then h(b) is the symbol appearing in column A of c(w).

(2) If b appears in some row c(w), where w is a row of $W(b_0)$, then $h(b) = b$.

The first rule follows from the fact that w is mapped to c(w). The second rule is valid, since c(w) is in the image of a folding and, hence, c(w) must be mapped to itself.

Suppose that by applying these rules to all the variables appearing in w and c(w), for all w in $W(b_0)$, a contradiction is derived (i.e., two different values are obtained for h(b) for some variable b). Then a folding that eliminates $b_0$ does not exist. Similarly, if the rules imply that $h(b_0) = b_0$, then $b_0$ is essential.

If the rules do not imply that $b_0$ is essential, then at least we can use them to find some other rows that do not belong to the core of T (i.e., the image of $\psi$). That is, if h has already been computed for a nondistinguished variable b, and $h(b) \neq b$, then every row w that contains b is mapped to c(w). Thus, let M be a set that initially is equal to $W(b_0)$. As we discover more rows that are not in the core of T, these rows are added to M. Rules (1) and (2) for computing h can be applied to all the rows of M, instead of only to the rows of $W(b_0)$. If a contradiction is derived during this process (or $h(b_0) = b_0$), then $\psi$ cannot be a folding and, hence, $b_0$ is essential. If no contradiction is derived, let M denote the final value of M, reached when no more rows can be added to M and neither Rule (1) nor Rule (2) can be applied to rows of M. Define a mapping $\theta$ as follows:

$$\theta(w) = \begin{cases} c(w) & \text{if } w \in M \\ w & \text{if } w \notin M \end{cases}$$

Lemma 9: The mapping $\theta$ is a containment mapping, and no row of M is in the image of $\theta$ (provided that $h(b_0) \neq b_0$).

Proof: For all rows w of T, c(w) covers w and, hence, $\theta$ satisfies the first two conditions of a containment mapping. Suppose that rows w and x of T have the same nondistinguished variable b in some column A. If both w and x belong to M, then $\theta(w)$ and $\theta(x)$ have the same symbol in column A (because the rules for computing h do not imply a contradiction). If neither w nor x is in M, then obviously $\theta(w)$ and $\theta(x)$ have the same symbol in column A. Suppose that w is in M but x is not. Therefore, $h(b) = b$ (otherwise all the rows containing b would be in M) and, hence, $\theta(w)$ and $\theta(x)$ both have b in column A. If x is in M but w is not, then we get a similar result. Thus, $\theta$ also satisfies the third condition of a containment mapping.

Suppose that a row w of M is mapped to some other row u of M. Since u is in M, it must have some repeated nondistinguished variable b such that $h(b) \neq b$. But since w is mapped to u, Rule (2) implies that $h(b) = b$. However, it is assumed that no contradiction has occurred and, hence, w cannot be mapped to u. []

It follows that if the rules do not imply a contradiction and $h(b_0) \neq b_0$, then all the rows of M (and hence all the rows containing $b_0$) can be eliminated from T. If a contradiction is derived, then $b_0$ is essential. Thus, by repeating this process for every repeated nondistinguished variable in T, we obtain a regular tableau T' equivalent to T.

The procedure FOLD(b,T) decides whether a nondistinguished variable b is essential. If b is essential, then FOLD(b,T) returns the empty set. If b is not essential, then FOLD(b,T) returns a set of rows (containing all the rows having b) that can be eliminated from T. The complete algorithm is given in Figure 5. It is assumed that the variables of T are represented by the numbers 1, 2, ..., m. The values of the homomorphism computed by the procedure FOLD(b,T) are stored in the array h; initially all the entries in this array are zero.
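A Python sketch of Figure 5 follows. It assumes a hypothetical string encoding of symbols (a leading "a" marks a distinguished variable, a leading "b" a nondistinguished variable, anything else a constant) and reads "w covers x" as "w has x's symbol in every column where x has a distinguished variable or a constant". A missing key in the dictionary h plays the role of the value 0 in Figure 5. One deliberate deviation: the main loop recomputes c on the current tableau before each call of FOLD, which is simpler to state but costs $O(n^2)$ per call, so the sketch does not match the bound of Theorem 11.

```python
from collections import Counter, deque


def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def covers(w, x):
    # Assumed definition: w agrees with x wherever x has a distinguished variable or constant.
    return all(ws == xs for xs, ws in zip(x, w) if not is_nondist(xs))


def cover_function(rows):
    # c[i]: the other row covering row i, or i itself (Class 2 assumed: at most one such row).
    return [next((j for j, w in enumerate(rows) if j != i and covers(w, x)), i)
            for i, x in enumerate(rows)]


def fold(b, rows, c):
    """Return the indices of removable rows (including every row containing b),
    or an empty set if b is essential."""
    occ = {}  # symbol -> indices of the rows containing it
    for i, r in enumerate(rows):
        for s in r:
            occ.setdefault(s, []).append(i)

    def repeated(s):
        return is_nondist(s) and len(occ[s]) > 1

    h, M = {}, set()                    # lines (1)-(2)
    queue, queued = deque([b]), {b}     # line (3)
    while queue:                        # line (4)
        d = queue.popleft()             # line (5)
        for w in occ[d]:                # line (6)
            if w in M:
                continue
            M.add(w)                    # line (7)
            cw = rows[c[w]]
            for e in filter(repeated, cw):       # lines (8)-(10): Rule (2)
                if h.setdefault(e, e) != e:
                    return set()
            for A, e in enumerate(rows[w]):      # lines (11)-(16): Rule (1)
                if not repeated(e):
                    continue
                if h.setdefault(e, cw[A]) != cw[A]:
                    return set()
                if h[e] != e and e not in queued:
                    queued.add(e)
                    queue.append(e)
    return set() if h.get(b) == b else M  # line (17)


def regularize_class2(rows):
    rows = [tuple(r) for r in rows]
    considered = set()
    while True:                                      # lines (18)-(21)
        counts = Counter(s for r in rows for s in r)
        todo = sorted(s for s in counts
                      if is_nondist(s) and counts[s] > 1 and s not in considered)
        if not todo:
            return rows                              # line (22)
        b = todo[0]
        considered.add(b)
        M = fold(b, rows, cover_function(rows))      # c recomputed on the current rows
        rows = [r for i, r in enumerate(rows) if i not in M]


T = [("a1", "b1", "x"), ("b2", "b1", "y"), ("a1", "a2", "x"), ("b3", "a2", "y")]
print(regularize_class2(T))  # [('a1', 'a2', 'x'), ('b3', 'a2', 'y')]
```

In the example, Rule (1) gives h(b1) = a2 from both rows containing b1, no contradiction arises, and both rows are removed.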

Lemma 10: A call FOLD(b,T) requires $O(n)$ time, where n is the size of T.

Proof: Suppose that T has r rows and t columns. For each row w, the cost $O(t)$ of executing the loop of lines (8)-(10) and the loop of lines (11)-(16) is assigned to w. Line (7) is executed at most once for each row and, therefore, the total cost of lines (7)-(16) is $O(rt)$. The cost of finding all the rows that contain d in line (6) is assigned to d. Let cost(d) be that cost. For each symbol we use a linked list that points to all the rows containing that symbol, and since each d is put on QUEUE at most once, $\sum_d \mathrm{cost}(d) = O(n)$. Thus, the total cost of the
procedure FOLD(b,T):
begin
     array h[1:m];
     /* Initially QUEUE is empty */
(1)  M := ∅;
(2)  h := 0;
(3)  add b to QUEUE;
(4)  while QUEUE is not empty do
     begin
(5)     find and delete d, the first element of QUEUE;
(6)     for each row w that contains d, and w ∉ M do
        begin
(7)        add w to M;
(8)        for each repeated nondistinguished
              variable e that appears in c(w) do
(9)           if h(e) = 0 then h(e) := e
(10)          else if h(e) ≠ e then return ∅;
(11)       for each column A of w that has a repeated
              nondistinguished variable e do
           begin
(12)          let s be the symbol of c(w) in column A;
(13)          if h(e) = 0 then h(e) := s;
(14)          if h(e) ≠ s then return ∅;
(15)          if h(e) ≠ e
                 and e has not been on QUEUE
(16)          then add e to QUEUE;
           end;
        end;
     end;
(17) if h(b) = b then return ∅ else return M;
end FOLD;

begin /* main procedure */
     /* Initially all the variables are marked "unconsidered". */
(18) while there is a repeated nondistinguished
        variable b marked "unconsidered" do
     begin
(19)    mark b "considered";
(20)    M := FOLD(b,T);
(21)    delete all the rows of M from T;
     end;
(22) return T;
end

Figure 5

outer loop (lines (4)-(16)) is $O(n)$. Finally, note that lines (1)-(3) and line (17) have a total cost of no more than $O(n)$. []
Theorem 11: A tableau T, in which each row is covered by at most one row besides itself, can be minimized in time $O(n^2)$.


Proof: Suppose that T' is the result of applying the algorithm of Figure 5 to T. If a repeated nondistinguished variable b is not essential in T', then it is not essential in T or in any tableau obtained from T by deleting some redundant rows (because T and T' are equivalent). Thus, all the rows containing b are deleted in line (21), and hence T' is regular. FOLD is called at most once for each variable of T, that is, at most n times; hence, by Lemma 10, the running time of this algorithm is $O(n^2)$. Applying the $O(n^2)$ algorithm for regular tableaux to T' completes the minimization. []


6.4 Simple Tableaux

Suppose that T is a simple tableau. Let S be a set of rows, and let w be a row of T. The closure of S with respect to w, denoted $CL_w(S)$, is the minimal set of rows such that

(1) $S \subseteq CL_w(S)$, and

(2) if x is a row in $CL_w(S)$ such that x has a repeated nondistinguished variable b in some column A, and w has some other symbol in this column, then all the rows containing b are in $CL_w(S)$.

In [3] it is shown that if w covers $CL_w(S)$, then the tableau obtained by deleting all the rows of $CL_w(S) - \{w\}$ is equivalent to T (this is true even if T is not simple). Furthermore, if some repeated nondistinguished variable b of T is not essential, then there is a row w (that does not contain b) such that w covers $CL_w(W(b))$. (Note that w is not in $CL_w(W(b))$, since w is not in W(b).)
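The closure can be computed with a worklist that adds, for each row already in the set, all rows sharing a repeated nondistinguished variable in a column where w has a different symbol. The Python sketch below is our illustration of the definition, not the paper's procedure; it assumes a hypothetical string encoding of symbols (a leading "a" marks a distinguished variable, a leading "b" a nondistinguished variable, anything else a constant) and reads "w covers x" as "w has x's symbol in every column where x has a distinguished variable or a constant".

```python
from collections import Counter


def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def covers(w, x):
    # Assumed definition: w agrees with x wherever x has a distinguished variable or constant.
    return all(ws == xs for xs, ws in zip(x, w) if not is_nondist(xs))


def closure(S, w, rows):
    """CL_w(S) as a set of row indices; S is a set of indices and w a row tuple."""
    counts = Counter(s for r in rows for s in r)
    result, stack = set(S), list(S)
    while stack:
        x = rows[stack.pop()]
        for A, b in enumerate(x):
            # Condition (2): a repeated nondistinguished b where w has another symbol
            if is_nondist(b) and counts[b] > 1 and w[A] != b:
                for j, r in enumerate(rows):
                    if r[A] == b and j not in result:
                        result.add(j)
                        stack.append(j)
    return result


rows = [("a1", "b1"), ("b2", "b1"), ("b2", "a2"), ("a1", "a2")]
cl = closure({0, 1}, rows[3], rows)                  # S = W(b1), w = row 3
print(sorted(cl))                                    # [0, 1, 2]
print(all(covers(rows[3], rows[i]) for i in cl))     # True
```

Because row 3 covers the whole closure, the result of [3] quoted above allows rows 0, 1 and 2 to be deleted.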
These results can be used to obtain a regular tableau equivalent to T as follows. Compute $CL_w(W(b))$ for each repeated nondistinguished variable b and each row w that does not contain b. If w covers $CL_w(W(b))$, then delete all the rows of $CL_w(W(b))$ from T. The resulting tableau is equivalent to T, and it is regular because all repeated nondistinguished variables that are not essential have been eliminated [3].

In this section we describe an implementation of this algorithm that runs in $O(n^2)$ time.


Lemma 12: Let w be a row of a simple tableau T. Suppose that $b_1$ and $b_2$ are two repeated nondistinguished variables of T that do not occur in w. Then $CL_w(W(b_1))$ and $CL_w(W(b_2))$ are either equal or disjoint.

Proof: Let $S_1 = CL_w(W(b_1))$ and $S_2 = CL_w(W(b_2))$. Suppose that some row x belongs to both $S_1$ and $S_2$. Row x must have some repeated nondistinguished variable b that does not occur in w and, hence, W(b) is contained in both $S_1$ and $S_2$. We claim that both $S_1$ and $S_2$ are equal to $CL_w(W(b))$. By the definition of a closure, if R is a subset of $CL_w(S)$ (for any row w and set of rows S), then $CL_w(R) \subseteq CL_w(S)$. Thus, both $S_1$ and $S_2$ contain $CL_w(W(b))$.

Let $S = CL_w(W(b))$. Suppose that $S_1$ is not equal to S. If some row of $W(b_1)$ is in S, then all the rows of $W(b_1)$ are in S, and S is equal to $S_1$ (because S satisfies the definition of $CL_w(W(b_1))$). Thus $W(b_1)$ must be contained in $S_1 - S$. We will derive a contradiction by showing that $S_1 - S$ also satisfies the second condition of $CL_w(W(b_1))$. Let u be any row of $S_1 - S$, and let b' be any repeated nondistinguished variable that appears in u in some column A in which w has some other symbol. If some row of W(b') is in S, then all the rows of W(b') must be in S, and u cannot be in $S_1 - S$. Thus, W(b') is disjoint from S and, hence, all the rows containing b' are in $S_1 - S$ (since they are in $S_1$). Therefore, $S_1 - S$ also satisfies the second condition of $CL_w(W(b_1))$. Since S is nonempty and contained in $S_1$, the set $S_1 - S$ is strictly smaller than $S_1$, which contradicts the minimality of $S_1$. It follows that $S_1$ is equal to S. Similarly, $S_2$ is equal to S. []

Lemma 12 implies that if $CL_w(W(b))$ has been computed, and b' is a repeated nondistinguished variable that appears in some row of $CL_w(W(b))$, then there is no need to compute $CL_w(W(b'))$ (it is assumed that neither b nor b' occurs in w). Thus, for each row w we do the following. At first all repeated nondistinguished variables that do not appear in w are marked "unconsidered". The next step is to compute $CL_w(W(b))$ for some repeated nondistinguished variable b that is marked "unconsidered". During this step all repeated nondistinguished variables that occur in some row of $CL_w(W(b))$ are marked "considered". If $CL_w(W(b))$ is covered by w, then all the rows of $CL_w(W(b))$ are deleted. This step is repeated for some other variable that is marked "unconsidered", until all the variables are marked "considered". The complete algorithm is described in Figure 6.
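The procedure of Figure 6 can be sketched in Python as follows. This is a minimal sketch under assumptions of ours: symbols are encoded, hypothetically, as strings (a leading "a" marks a distinguished variable, a leading "b" a nondistinguished variable, anything else a constant), and "w covers x" is read as "w has x's symbol in every column where x has a distinguished variable or a constant". The set `seen` stands for the rows that are either in S or on QUEUE, so testing membership in it is the constant-time check used in the analysis of Theorem 13. The output is a regular tableau, to which the minimization algorithm for regular tableaux would still be applied.

```python
from collections import Counter, deque


def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def covers(w, x):
    # Assumed definition: w agrees with x wherever x has a distinguished variable or constant.
    return all(ws == xs for xs, ws in zip(x, w) if not is_nondist(xs))


def regularize_simple(rows):
    """For each row w, compute the disjoint closures CL_w(W(b)) and delete each
    closure that w covers."""
    rows = [tuple(r) for r in rows]
    alive = set(range(len(rows)))
    for wi in range(len(rows)):                        # line (15)
        if wi not in alive:
            continue
        w = rows[wi]
        counts = Counter(s for i in alive for s in rows[i])
        rep = {s for s in counts if is_nondist(s) and counts[s] > 1}
        considered = {s for s in w if s in rep}        # lines (16)-(18)
        for b in sorted(rep - considered):             # line (19)
            if b in considered:
                continue
            # CLOSURE(b, w), lines (1)-(14)
            considered.add(b)
            seen = {i for i in alive if b in rows[i]}
            queue = deque(sorted(seen))
            while queue:
                v = rows[queue.popleft()]
                for A, d in enumerate(v):
                    if v[A] != w[A] and d in rep and d not in considered:
                        considered.add(d)
                        for x in alive:
                            if rows[x][A] == d and x not in seen:
                                seen.add(x)
                                queue.append(x)
            if all(covers(w, rows[i]) for i in seen):   # lines (20)-(21)
                alive -= seen
    return [rows[i] for i in sorted(alive)]            # line (22)


print(regularize_simple([("a1", "b1"), ("b2", "b1"), ("b2", "a2"), ("a1", "a2")]))
# [('a1', 'a2')]
```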

Theorem 13: A simple tableau T can be minimized in $O(n^2)$ time.

Proof: By Lemma 12 and the results of [3], the algorithm of Figure 6 returns a regular tableau equivalent to T. Consider the time complexity of this algorithm. We assume that T has r rows and t columns, and n
procedure CLOSURE(b,w):
begin
(1)  S := ∅;
(2)  make QUEUE empty;
(3)  mark b "considered";
(4)  add all the rows containing b to QUEUE;
(5)  while QUEUE is not empty do
     begin
(6)     let v be the first row on QUEUE;
(7)     move v from QUEUE to S;
(8)     for every column A do
(9)        if v and w disagree in column A, v has a repeated
              nondistinguished variable d in this column,
              and d is marked "unconsidered" then
           begin
(10)          mark d "considered";
(11)          for every row x containing d do
(12)             if x is neither in S nor on QUEUE
(13)             then add x to QUEUE;
           end;
     end;
(14) return S;
end

begin /* main procedure */
(15) for every row w do
     begin
(16)    mark all repeated nondistinguished variables "unconsidered";
(17)    for every repeated nondistinguished variable d
           that occurs in w do
(18)       mark d "considered";
(19)    while there is a repeated nondistinguished variable b
           marked "unconsidered" do
        begin
(20)       R := CLOSURE(b,w);
(21)       if R is covered by w then delete all the rows of R from T;
        end;
     end;
(22) return T;
end

Figure 6

is the size of T (i.e., $n = O(rt)$). At first we will show that a call CLOSURE(b,w) requires $O(st)$ time, where s is the number of rows in $CL_w(W(b))$. Consider the cost of executing the loop of lines (11)-(13).
We assume that for every variable in T there is a linked list that points to all the rows containing this variable. These lists can be created in $O(n)$ time prior to the execution of the algorithm. By using these lists, the cost of finding all the rows containing d in the loop of lines (11)-(13) is proportional to the number of these rows. The test of deciding whether a row is in S or on QUEUE can be implemented in constant time. Thus, the cost of lines (11)-(13) is accounted for by assigning a constant cost to each row x whenever x is examined in this loop. A row is examined in this loop no more than t times, since it has at most t variables and each variable is marked "considered" only once. Also note that if a row is examined in this loop, then it belongs to $CL_w(W(b))$. Therefore, the total cost of lines (11)-(13) is $O(st)$.

Consider now the cost of executing the loop of lines (5)-(13) once (excluding the cost of lines (11)-(13)). Lines (6) and (7) have a constant cost. The cost of lines (8)-(10) is $O(t)$. Since the loop of lines (5)-(13) is repeated s times, the total cost of this loop is $O(st)$. Lines (1)-(4) have a cost of $O(s)$. Thus, a call CLOSURE(b,w) requires $O(st)$ time.

We now compute the cost of executing the loop of lines (15)-(21) once. The cost of line (16) is $O(t)$ (since each column of a simple tableau has at most one repeated nondistinguished variable). The loop of lines (17)-(18) requires $O(t)$ time. The cost of line (19) is no more than the number of repeated nondistinguished variables, i.e., $O(t)$. The cost of executing lines (20)-(21) once is $O(st)$, where s is the number of rows in R. For each row w, all the sets obtained as a value of R in line (20) are pairwise disjoint (by Lemma 12). Thus, the cost of executing the loop of lines (15)-(21) once is $O(rt)$. This loop is repeated r times and, hence, the total cost is $O(r^2 t)$, i.e., no more than $O(n^2)$. []


Theorem 14: If $T_1$ and $T_2$ are simple tableaux, then testing whether $T_1$ is equivalent to $T_2$ can be done in $O(n^2)$ time.

Proof: By using the algorithm of Figure 6, we compute regular tableaux $T'_1$ and $T'_2$ equivalent to $T_1$ and $T_2$, respectively. It follows from the results of [3] that testing whether $T'_1$ is equivalent to $T'_2$ can be done in $O(n^2)$ time, where n is the size of $T_1$ and $T_2$. []


7. Decomposition of Tableaux

Let T be a tableau that does not necessarily belong to one of the classes we have discussed so far. None of the three minimization algorithms is guaranteed to minimize T in polynomial time, but they can be used as heuristics. The minimization algorithm for simple tableaux can be applied to any tableau, and the result is an equivalent tableau, possibly with fewer rows. The idea behind the algorithm of Section 6.2 can be used to reduce tableaux as follows. Let b be a repeated nondistinguished variable of a tableau T, and suppose that the set S of all the rows containing b does not have any other repeated nondistinguished variable. Then all the rows of S can be deleted if they are covered by a set of rows that have the same symbol in the column of b.
The algorithm of Section 6.3 is not only good as a heuristic, but can also be used as an exponential-time minimization algorithm for tableaux. Let T be a tableau. For every row i of T, we define C(i) to be the set of all the rows that cover row i, i.e.,

$$C(i) = \{\, j \mid j \neq i \text{ and row } j \text{ covers row } i \,\}$$

Suppose we construct a function c such that for all i, $c(i) \in C(i)$ (it is understood that $c(i) = i$ if $C(i) = \emptyset$). Using c we can apply the algorithm of Section 6.3 as a heuristic. The number of all possible c's is exponential only in the number of rows i for which C(i) has more than one element. If we execute the algorithm once for each possible c we are guaranteed to minimize T, since at least one of these c's corresponds to a folding from T to its core.
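The candidate functions c can be enumerated as a Cartesian product of the sets C(i), with the singleton {i} standing in for an empty C(i). The Python sketch below illustrates this enumeration; the symbol encoding (strings with a leading "a" for distinguished variables, a leading "b" for nondistinguished variables, anything else a constant) and the reading of "covers" (w has x's symbol in every column where x has a distinguished variable or a constant) are our assumptions.

```python
from itertools import product


def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def covers(w, x):
    # Assumed definition: w agrees with x wherever x has a distinguished variable or constant.
    return all(ws == xs for xs, ws in zip(x, w) if not is_nondist(xs))


def cover_sets(rows):
    # C(i) = { j | j != i and row j covers row i }
    return [[j for j, w in enumerate(rows) if j != i and covers(w, x)]
            for i, x in enumerate(rows)]


def candidate_functions(rows):
    """Yield every c with c(i) in C(i), taking c(i) = i when C(i) is empty."""
    choices = [C or [i] for i, C in enumerate(cover_sets(rows))]
    for c in product(*choices):
        yield list(c)


rows = [("b1", "b2"), ("a1", "a2"), ("a1", "a3")]
print(list(candidate_functions(rows)))  # [[1, 1, 2], [2, 1, 2]]
```

Only row 0 has more than one covering row, so exactly two candidates are produced, in line with the observation that the count grows only with such rows.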

The following approach can be used to further reduce the exponential factor in the running time of this algorithm. We define a relation R on the rows of a tableau T as follows. For rows x and w of T, xRw if and only if x and w have the same nondistinguished variable in some column. Obviously, R is symmetric and reflexive. Let $P_1, P_2, \ldots, P_q$ be the equivalence classes of the transitive closure of R.
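The classes $P_1, \ldots, P_q$ can be computed without forming the transitive closure explicitly, for example with a union-find structure that merges all rows sharing a nondistinguished variable in the same column. Union-find is our choice for this sketch, not a method stated in the paper; the string encoding of symbols (a leading "b" marks a nondistinguished variable) is likewise a hypothetical convention.

```python
def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def row_classes(rows):
    """Classes P_1, ..., P_q of the transitive closure of R, where x R w iff rows
    x and w share a nondistinguished variable in some column."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first = {}  # (column, variable) -> first row index containing it
    for i, r in enumerate(rows):
        for A, s in enumerate(r):
            if is_nondist(s):
                parent[find(i)] = find(first.setdefault((A, s), i))
    classes = {}
    for i in range(len(rows)):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


rows = [("a1", "b1"), ("b2", "b1"), ("b2", "a2"), ("a1", "b3"), ("b4", "a2")]
print(row_classes(rows))  # [[0, 1, 2], [3], [4]]
```

Rows 0 and 1 share b1 and rows 1 and 2 share b2, so these three rows form one class even though rows 0 and 2 share no variable.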

Lemma 15: Let $\theta$ be a containment mapping from a tableau T to itself, and let k ($1 \le k \le q$) be a fixed integer. Define

$$\delta(x) = \begin{cases} \theta(x) & \text{if } x \in P_k \\ x & \text{otherwise} \end{cases}$$

Then $\delta$ is a containment mapping.
Proof: Obviously, for all rows x, $\delta(x)$ covers x. Suppose that rows x and w have the same nondistinguished variable in column A. Thus, either both w and x are in $P_k$, or both are not in $P_k$. In either case, $\delta(x)$ and $\delta(w)$ have the same symbol in column A. []

Lemma 15 implies that when the algorithm of Section 6.3 is applied as a heuristic, we have to consider only c's such that, for some fixed k, $c(i) \in C(i)$ if row i is in $P_k$; otherwise $c(i) = i$. If among all the $P_j$'s, $P_k$ has a maximum number, say m, of rows for which C(i) contains more than one element, then the number of all possible c's is exponential only in m. For each $P_k$ there is at least one possible c that maps all redundant rows in $P_k$ to the core of T. Thus, no c has to be considered more than once.


8. Synthesis of Expressions from Tableaux

Optimal expressions can be synthesized from simple tableaux in $O(n^2)$ time [5]. Suppose we are given a tableau T. The optimization process is done in two steps.

(a) Minimize the number of rows in T.

(b) Produce from T an expression in which select and project are applied as early as possible.

The approach taken in the second step was found useful by previous workers (e.g., [12,17]) in the field of expression optimization, and can be viewed as our "cost function."
In order to obtain an optimal expression for a minimal tableau T, [5] uses the following approach. At first all constants are deleted from the summary of T. Then an optimal expression E is synthesized. Finally a new operator augment, defined by

$$\alpha_{A,c}(r) = \{\, u \mid u(A) = c \text{ and there exists } v \in r \text{ such that } u(B) = v(B) \text{ for all attributes } B \text{ in the relation scheme of } r \,\},$$

is applied to E to introduce the constants that were deleted from the summary of T.
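The augment operator simply extends every tuple of r with the value c for the attribute A. In the Python sketch below, a relation is represented, by our own choice, as a list of dictionaries from attribute names to values; the sketch assumes, as the definition implies, that A is not already in the relation scheme of r.

```python
def augment(attr, const, relation):
    """alpha_{attr,const}(r): extend every tuple of r with the value const for
    the new attribute attr (attr must not already be in the scheme of r)."""
    for v in relation:
        if attr in v:
            raise ValueError(f"attribute {attr!r} is already in the relation scheme")
    return [{**v, attr: const} for v in relation]


r = [{"X": 1, "Y": 2}, {"X": 3, "Y": 4}]
print(augment("Z", "c", r))  # [{'X': 1, 'Y': 2, 'Z': 'c'}, {'X': 3, 'Y': 4, 'Z': 'c'}]
```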

Suppose we decompose a tableau T into equivalence classes $P_1, P_2, \ldots, P_q$ as described in Section 7, and let s be the summary of T. For every j, we can define a tableau $T_j$ that has the same rows as $P_j$ and a summary as follows. For each column A, if s has a nonblank symbol in column A and that symbol appears also in column A of $P_j$, then the summary of $T_j$ has the same symbol in column A; otherwise, it has a blank. It follows that $T = \bowtie_{j=1}^{q} T_j$. Thus, T comes from an expression if and only if each $T_j$ comes from an expression. Suppose that each $T_j$ comes from the expression $E_j$. Then T corresponds to the expression $E = \bowtie_{j=1}^{q} E_j$. If T is a minimal tableau, we can get an optimal expression for T as follows. At first all constants are deleted from the summary of T. Then we decompose T and find an optimal expression $E_j$ for each $T_j$. (Note that each $T_j$ is minimal.) Let $E = \bowtie_{j=1}^{q} E_j$. By applying augmentation to E we get an optimal expression for T.
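The summary rule for the component tableaux $T_j$ can be sketched in a few lines of Python. The representation is an assumption of ours: rows and summaries are tuples of symbols, a blank is `None`, and each class $P_j$ is given as a list of row indices.

```python
def component_summaries(summary, rows, classes):
    """Summary of each T_j: keep the symbol of s in column A only when it is
    nonblank and also appears in column A of some row of P_j; otherwise blank (None)."""
    return [tuple(s if s is not None and any(rows[i][A] == s for i in P) else None
                  for A, s in enumerate(summary))
            for P in classes]


summary = ("a1", "a2")
rows = [("a1", "b1"), ("b2", "b1"), ("b3", "a2")]
print(component_summaries(summary, rows, [[0, 1], [2]]))  # [('a1', None), (None, 'a2')]
```

Here the first class keeps a1 because row 0 has it in the first column, while a2 goes to the second class only.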


If T is a tableau with at most one repeated nondistinguished variable in each row, then each $T_j$ is a simple tableau (because it has at most one repeated nondistinguished variable). Thus, we can synthesize an optimal expression for T in polynomial time. However, if T is a tableau in which every row is covered by at most one row besides itself, then testing whether T comes from an expression is NP-complete.

Theorem 16: Let T be a tableau in which every row is covered by at most one row besides itself. The problem of deciding whether T corresponds to a restricted relational expression is NP-complete.

Proof: The problem of testing whether a tableau T corresponds to an expression (shown NP-complete in [19]) is reduced to this problem as follows. Let T be a tableau, and let G be a new attribute. We construct a tableau T' by adding to T a new column that corresponds to G. The summary of T' has a blank, and row i has the constant i, in this column. We claim that T corresponds to an expression if and only if T' corresponds to an expression.

If. Let E' be an expression for T'. Delete G from each relation scheme in E', and delete every selection operator that is applied to the attribute G. Each projection operator $\pi_X$ that appears in E' is replaced with $\pi_{X - \{G\}}$. Let E be the resulting expression. It is easy to show that E corresponds to T.


Only if. Let E be an expression for T. Let $R_i$ be the relation scheme that corresponds to row i of T, and define $R'_i = R_i \cup \{G\}$ for every i. An expression E' for T' is obtained from E by replacing each $R_i$ with $\pi_{R_i}(\sigma_{G=i}(R'_i))$.


(1) See [2,11] for an exposition of NP-completeness and related topics.
Since each row of T' is covered by no row besides itself, and an expression for T can be obtained from an expression for T' in polynomial time, the proof is complete. []
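The construction of T' in this proof can be sketched in Python to show why no row of T' covers another: the new column G holds a different constant in every row. The encoding is hypothetical (strings with a leading "a" for distinguished variables, a leading "b" for nondistinguished variables, integers and other strings for constants, `None` for a blank), and "covers" is read as "w has x's symbol in every column where x has a distinguished variable or a constant".

```python
def is_nondist(sym):
    # Hypothetical encoding: nondistinguished variables are strings starting with "b".
    return isinstance(sym, str) and sym.startswith("b")


def covers(w, x):
    # Assumed definition: w agrees with x wherever x has a distinguished variable or constant.
    return all(ws == xs for xs, ws in zip(x, w) if not is_nondist(xs))


def add_g_column(summary, rows):
    """Build T' from T: a new column G, blank (None) in the summary and
    holding the constant i in row i (integers serve as constants here)."""
    return tuple(summary) + (None,), [tuple(r) + (i,) for i, r in enumerate(rows)]


rows = [("a1", "b1"), ("b2", "b1")]
print(covers(rows[0], rows[1]))  # True: row 0 covers row 1 in T
s2, rows2 = add_g_column(("a1", None), rows)
print(rows2)                     # [('a1', 'b1', 0), ('b2', 'b1', 1)]
print(any(covers(rows2[j], rows2[i]) for i in range(2) for j in range(2) if i != j))  # False
```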